Kubernetes 1.31 shipped on 13 August 2024 and is probably one of the calmest releases of the last two years. No flashy headline, no change that forces a manifest rewrite, no redesigned API client. And that is exactly why it deserves a careful reading: when a release does not push loud novelties, it is usually closing old debts. 1.31 closes several, and some of them change the day to day of cluster operators far more than their announcements suggest.
AppArmor is no longer an annotation
For years, applying an AppArmor profile to a pod meant a metadata annotation, a convention inherited from the past that carried every problem annotations have: no schema validation, no discovery via kubectl explain, easy to forget in a Helm chart copy-paste. In 1.31 AppArmor reaches GA as a first-class field inside securityContext.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-web
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: restricted-web
  containers:
  - name: app
    image: nginx:1.27
The change looks cosmetic, but the consequences are practical: admission webhooks can validate the profile as part of the schema, policy engines (OPA/Gatekeeper, Kyverno) get a canonical path to inspect, and compliance teams stop depending on annotations that get lost in rebases. For anyone already hardening nodes with AppArmor, this consolidates a layer that used to be fragile. It also makes it realistic to enforce a cluster-wide baseline profile without writing a mutating webhook to plant the right annotation on every pod.
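As a sketch of what that canonical path enables, a Kyverno ClusterPolicy could require the field cluster-wide; the policy name and the decision to enforce (rather than audit) are illustrative choices, not a recommendation:

```yaml
# Hypothetical Kyverno policy: reject pods that do not set a pod-level
# AppArmor profile. Container-level securityContext.appArmorProfile is
# also valid in 1.31; a production policy would need to cover both.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-apparmor
spec:
  validationFailureAction: Enforce
  rules:
  - name: apparmor-profile-set
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Pods must set spec.securityContext.appArmorProfile."
      pattern:
        spec:
          securityContext:
            appArmorProfile:
              # Kyverno's "|" pattern syntax: either value is accepted
              type: "RuntimeDefault | Localhost"
```

The same check against an annotation key was always possible, but only the typed field lets the policy engine rely on the API server having already validated the value's shape.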
Real sidecars, at last
The most awaited change. Until 1.29 the sidecar pattern was, quite literally, a hack: one more container in the pod, with no guarantees about start order relative to the main container, no termination ordering, and the classic failure where the proxy died before the application and blocked a clean shutdown. The community's answer was a mix of preStop scripts, shareProcessNamespace, and service-mesh-specific workarounds.
In 1.31, sidecars declared as initContainers with restartPolicy: Always — beta since 1.29 and enabled by default, even if formal graduation to stable came in a later release — are mature enough to rely on. They start before the main container, restart independently if they fail, and terminate after the main one finishes. That single change resolves three chronic pains: service meshes that blocked Jobs because the sidecar stayed alive, log collectors that lost the final bytes before shutdown, and security proxies that came up late and left a window uncovered.
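Concretely, a Job with a log-forwarding sidecar looks like this (the images and command are illustrative, not a recommendation):

```yaml
# The restartPolicy: Always on the init container is the one field that
# turns it into a sidecar: it starts before "export", is restarted on
# failure, and is stopped by the kubelet after "export" finishes —
# so the Job can actually complete.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-export
spec:
  template:
    spec:
      restartPolicy: Never          # applies to the main container only
      initContainers:
      - name: log-forwarder
        image: timberio/vector:0.39.0-alpine
        restartPolicy: Always       # sidecar marker
      containers:
      - name: export
        image: alpine:3.20
        command: ["sh", "-c", "echo exporting && sleep 5"]
```

Before this, the forwarder would have been a regular container that kept the Job Pod alive forever after the export finished; with the sidecar semantics the Pod reaches completion as soon as the main container exits.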
The operational impact is enormous for anyone running Istio, Linkerd, Vector or any sidecar-style collector. Manifests get simpler, Jobs stop hanging, and a whole class of night-shift incidents disappears. It also changes how teams think about what belongs in a pod: with lifecycle actually modelled, it becomes reasonable to co-locate short-lived helpers (token refreshers, cache warmers, metric relays) that used to be awkward enough that people pushed them out into separate deployments.
DRA: alpha, but with intent
Dynamic Resource Allocation is still alpha in 1.31, but the API has been reworked around structured parameters (resource.k8s.io/v1alpha3), and the direction is clear. It replaces the old device-plugin model for GPUs, FPGAs, and miscellaneous accelerators with a claim-based abstraction, much like persistent volumes work: a pod declares what it needs, a driver resolves it, the scheduler reconciles. For single-tenant setups with one GPU per node, device plugins still work fine; but as soon as sharing, GPU time-slicing, or heterogeneous pools enter the picture, the old model falls short.
In 1.31 it is not yet production-ready, but anyone planning an ML or inference platform should be testing it in staging already: graduation to beta and then GA over the coming releases will force architectural decisions, and it is better to arrive with the shape of the API already internalised.
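The claim-based shape looks roughly like this, using the v1alpha3 API as shipped in 1.31; the device class name gpu.example.com and the object names are placeholders, and since this is an alpha API that has already changed between releases, the exact schema should be checked against the version running in your cluster:

```yaml
# A template that each pod instantiates into its own claim for one device
# of a (hypothetical) GPU device class managed by a DRA driver.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: train
    image: pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime
    resources:
      claims:
      - name: gpu          # references the entry in spec.resourceClaims
```

The structural rhyme with PVCs is deliberate: the pod names a claim, not a device, and the binding decision moves out of the kubelet and into the scheduler plus the driver.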
What’s gone
The in-tree drivers for CephFS and CephRBD are removed, and in-tree gce_pd is deprecated. If anyone is still using them, migrating to the equivalent CSI drivers is a prerequisite before the upgrade. This kind of removal is the one that produces surprises: the cluster starts up fine, but old PVs are left orphaned and nobody notices until a pod retries an attach. The safe workflow is to list each StorageClass's provisioner with kubectl get storageclass, cross-referenced against the live PVs, before touching the control plane.
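A minimal sketch of that review, assuming kubectl access to the cluster:

```shell
# Inventory provisioners: anything still on the removed in-tree Ceph
# drivers (kubernetes.io/cephfs, kubernetes.io/rbd) or on the deprecated
# kubernetes.io/gce-pd needs a CSI migration before the upgrade.
kubectl get storageclass \
  -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner'

# Cross-check the live volumes, not just the classes: old PVs keep the
# driver they were provisioned with. An empty DRIVER column here means
# the PV is not CSI-backed and deserves a closer look.
kubectl get pv \
  -o custom-columns='NAME:.metadata.name,DRIVER:.spec.csi.driver,CLASS:.spec.storageClassName'
```

Neither command mutates anything, so both can run against production before any maintenance window is booked.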
The scheduler, quieter
Scheduler improvements in 1.31 are incremental (a 5 to 10 percent throughput gain in the project's own benchmarks), but they show in two specific scenarios: large clusters with many Pending pods waiting for room, and workloads with active preemption, where eviction decisions used to be erratic. Not enough to justify an upgrade on its own, but it does explain why clusters close to saturation feel smoother after updating.
An honest upgrade checklist
Before jumping from 1.29 or 1.30, six things should be closed:
- a clean deprecation report on the current version;
- a recent etcd backup, verified with etcdctl snapshot status;
- node drains that respect the declared PDBs (and a check that those PDBs are actually satisfiable);
- CSI drivers in line with current compatibility matrices;
- a real test of any service mesh that will migrate to the new sidecar model;
- a canary with at least one node running the new kubelet before rolling it across the rest.
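For the backup item, a minimal sketch on a kubeadm control-plane node; the certificate paths are the kubeadm defaults and the backup directory is arbitrary, so adjust both to your layout:

```shell
# Take the snapshot against the local etcd member.
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify it immediately: an unreadable snapshot is worse than none,
# because it buys false confidence.
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%F).db -w table
```

The status output includes the hash, revision, and total keys; a sane revision count for your cluster size is the quickest smoke test that the file is not truncated.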
The upgrade itself, via kubeadm, holds no mystery: kubeadm upgrade plan, kubeadm upgrade apply v1.31.x, and then upgrading the kubelet node by node, respecting the drains. What hurts is not the command; it is having skipped the six points above.
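Spelled out, with $NODE as a placeholder and the exact patch version being whichever one you validated in the canary, the sequence looks roughly like:

```shell
# Control plane first, on one control-plane node:
kubeadm upgrade plan
kubeadm upgrade apply v1.31.0

# Then each node, one at a time, letting the PDBs gate the pace:
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
# ...upgrade the kubelet and kubectl packages on the node itself...
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon "$NODE"
```

If a drain hangs, that is the PDB-satisfiability check from the list above failing in production instead of in review.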
Kubernetes 1.31 will not appear in any event keynote, and that’s fine. It is a hygiene release: it pulls AppArmor out of the annotation limbo, turns the sidecar pattern into something the orchestrator actually understands, and prepares the ground for DRA to stabilise without being rushed. For anyone running clusters in production, the question is not whether it is worth upgrading, but how soon it can be done before the gap with upstream widens and every further jump starts accumulating its own debt.